EXAMPLE Examination
Scalable Data Engineering 


STUDENT 


Name, First Name: 


Student ID: 
SCORE 
Question No. | Max. Score | Score Achieved
1 | 8 |
2 | 4 |
3 | 12 |
4 | 3 |
5 | 3 |
Total | 30 |


Grade: [ ] 


PLEASE NOTE 


• Please have your student ID and a photo ID available. They will be checked during the exam.

• You have 90 minutes to complete this exam.

• Use only permanent ink (no pencil) in blue or black (no red or green).

• Use the empty spaces between the exam tasks to enter your answers. If this space is not enough, please use the back of the task sheets. Additional sheets will not be accepted. Do not separate the sheets!

• No aids are allowed!
Question 1: Multidimensional Modeling (8 Points)


The following schema represents a movie database. Movies are identified by their name and 
additional information like the production year, the production company and the director are 
available. 

movie title | kind | prod-year | season | episode | company | director | dir-status
Matrix | Sci-Fi | 1999 | | | Warner Bros. | The Wachowskis | active


a) Completely normalize the given unity table into a new schema. (6 Points) 
Describe the resulting schema, including: 


• relations with their attributes and primary keys

• foreign key relations
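To illustrate what a schema description for a) might look like, the following is a minimal sketch of one possible normalization, written as SQLite DDL and run from Python. The relation names, the surrogate keys (`kind_id`, `company_id`, `director_id`) and the choice to move `kind` into its own relation are assumptions made for illustration, not the reference solution.

```python
import sqlite3

# Hypothetical normalized schema. All surrogate keys and table names are
# illustrative assumptions; they do not appear in the original table.
DDL = """
CREATE TABLE kind     (kind_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE company  (company_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE director (director_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL,
                       status TEXT NOT NULL);
CREATE TABLE movie (
    title       TEXT PRIMARY KEY,
    prod_year   INTEGER,
    season      INTEGER,
    episode     INTEGER,
    kind_id     INTEGER REFERENCES kind(kind_id),
    company_id  INTEGER REFERENCES company(company_id),
    director_id INTEGER REFERENCES director(director_id)
);
"""


def build_example():
    """Create the sketch schema in memory and load the single example row."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(DDL)
    con.execute("INSERT INTO kind VALUES (1, 'Sci-Fi')")
    con.execute("INSERT INTO company VALUES (1, 'Warner Bros.')")
    con.execute("INSERT INTO director VALUES (1, 'The Wachowskis', 'active')")
    con.execute("INSERT INTO movie VALUES ('Matrix', 1999, NULL, NULL, 1, 1, 1)")
    return con


def denormalized(con):
    """Join the relations back together to recover the original wide row."""
    return con.execute(
        """
        SELECT m.title, k.name, m.prod_year, m.season, m.episode,
               c.name, d.name, d.status
        FROM movie m
        JOIN kind k     ON k.kind_id = m.kind_id
        JOIN company c  ON c.company_id = m.company_id
        JOIN director d ON d.director_id = m.director_id
        """
    ).fetchall()
```

Joining the relations along their foreign keys reproduces the original row, which is a quick check that the decomposition loses no information.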


b) Which type of multidimensional schema results from 1a)? Explain your answer.
(2 Points)
Question 2: Reporting Functions (4 Points)


The following schema describes an order processing system. An order consists of lineitem
elements, each of which holds the quantity of a certain product in that order. The TPC-H
schema is known from the lecture and the exercises.


Solve the following tasks using multidimensional queries. 


[Figure: TPC-H schema diagram. Relations (column prefix) and cardinalities:
PART (P_) SF*200,000; PARTSUPP (PS_) SF*800,000; LINEITEM (L_) SF*6,000,000;
ORDERS (O_) SF*1,500,000; CUSTOMER (C_) SF*150,000; SUPPLIER (S_) SF*10,000.
Attribute names legible in the figure: PARTKEY, ORDERKEY, SUPPLYCOST, TOTALPRICE,
COMMENT, ORDERDATE, CONTAINER, RETAILPRICE, RECEIPTDATE, SHIPINSTRUCT.]


a) List all parts (names) with their sold quantity and rank them in descending order. 
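As an illustration of the kind of reporting function this task is after, here is a minimal sketch that runs a `RANK()` window query against an in-memory SQLite database. The column names `P_PARTKEY`, `P_NAME`, `L_PARTKEY` and `L_QUANTITY` follow the TPC-H naming scheme; the toy rows in the usage note are hypothetical, and parts that were never sold are omitted because of the inner join.

```python
import sqlite3

# Aggregate the sold quantity per part, then rank the totals in descending order.
QUERY = """
SELECT p_name,
       sold_quantity,
       RANK() OVER (ORDER BY sold_quantity DESC) AS sales_rank
FROM (
    SELECT p_name, SUM(l_quantity) AS sold_quantity
    FROM part
    JOIN lineitem ON l_partkey = p_partkey
    GROUP BY p_partkey, p_name
)
ORDER BY sales_rank, p_name
"""


def ranked_parts(part_rows, lineitem_rows):
    """Load toy PART and LINEITEM rows and return (name, quantity, rank) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE part (p_partkey INTEGER PRIMARY KEY, p_name TEXT)")
    con.execute(
        "CREATE TABLE lineitem (l_orderkey INTEGER, l_partkey INTEGER, l_quantity INTEGER)"
    )
    con.executemany("INSERT INTO part VALUES (?, ?)", part_rows)
    con.executemany("INSERT INTO lineitem VALUES (?, ?, ?)", lineitem_rows)
    return con.execute(QUERY).fetchall()
```

With the hypothetical parts `bolt`, `nut` and `gear` and sold quantities of 9, 7 and 7, `RANK()` assigns rank 1 to `bolt` and rank 2 to both of the tied parts.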
(4 points) 
Question 3: Schema Matching (12 Points)
Match the attributes of the following two schemas. Use the stable marriage algorithm with 
bigram similarity for scoring. 


S1 | S2
[Table: attribute lists of schemas S1 and S2]


a) Find the bigram representation for each attribute. (3 Points) 
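The bigram representation and a similarity score derived from it can be sketched as follows. This sketch assumes that bigrams are taken over the lower-cased attribute name without padding, and that the score is the Dice coefficient on bigram sets; the lecture may define the normalization differently. The attribute names in the usage note are hypothetical.

```python
def bigrams(word):
    """Return the set of character bigrams of a lower-cased attribute name."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def bigram_similarity(a, b):
    """Dice coefficient on bigram sets: 2 * |A & B| / (|A| + |B|).

    This normalization is an assumption; other definitions exist.
    """
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))
```

For the hypothetical attributes `name` and `surname`, the bigram sets are {na, am, me} and {su, ur, rn, na, am, me}; they share three bigrams, so the score is 2 * 3 / (3 + 6) = 2/3.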


b) Apply the stable marriage algorithm. Use the bigram representations from 3a) as the
primary and alphabetical order as the secondary preference mechanism. S1 proposes to S2.
Report the bigram scores (use the table below), the preference lists, and every step of
the stable marriage algorithm. (9 Points)
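The procedure in b) can be sketched with the Gale-Shapley algorithm, where preference lists are sorted by descending score and ties are broken alphabetically. The attribute names `a1`, `a2`, `b1`, `b2` and the score table in the usage note are hypothetical, and the sketch assumes that a score is available for every pair.

```python
def stable_marriage(s1, s2, scores):
    """Gale-Shapley matching in which the attributes of s1 propose to those of s2.

    scores maps (s1_attribute, s2_attribute) to a similarity value.
    Returns the matching as a dict s1 -> s2 and a log of every proposal.
    """
    # Preference lists: higher score first, alphabetical order as tie-breaker.
    prefs1 = {a: sorted(s2, key=lambda b: (-scores[(a, b)], b)) for a in s1}
    prefs2 = {b: sorted(s1, key=lambda a: (-scores[(a, b)], a)) for b in s2}
    rank2 = {b: {a: i for i, a in enumerate(p)} for b, p in prefs2.items()}

    next_choice = {a: 0 for a in s1}  # index of the next attribute to propose to
    engaged = {}                      # s2 attribute -> current s1 partner
    free = sorted(s1)
    steps = []

    while free:
        a = free.pop(0)
        if next_choice[a] >= len(prefs1[a]):
            continue  # a has proposed to everyone (only possible if |s1| > |s2|)
        b = prefs1[a][next_choice[a]]
        next_choice[a] += 1
        if b not in engaged:
            engaged[b] = a
            steps.append(f"{a} proposes to {b}: accepted")
        elif rank2[b][a] < rank2[b][engaged[b]]:
            steps.append(f"{a} proposes to {b}: accepted, {engaged[b]} is freed")
            free.append(engaged[b])
            engaged[b] = a
        else:
            steps.append(f"{a} proposes to {b}: rejected")
            free.append(a)

    return {a: b for b, a in engaged.items()}, steps
```

With the hypothetical scores a1-b1 = 0.5, a1-b2 = 0.5, a2-b1 = 0.8 and a2-b2 = 0.1, a1 first wins b1 through the alphabetical tie-break, is then displaced by a2, and finally settles with b2.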
Question 4: Time Series Data (3 Points)
Given is a time series with a season length of s = 7 and a linear trend component. t marks the
time and y marks the time series values.
t | ? | ? | ? | ? | ? | ? | ? | ? | ? |
y | 8 | 10 | ? | ? | 16 | 17 | ? | ? | 23 |


a) Extract the trend component of the given time series using classical decomposition. 
(3 Points) 
t | | | | | | | | | |
trend | | | | | | | | | |
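As an illustration of trend extraction, the following minimal sketch assumes that the trend in classical decomposition is estimated with a centered moving average whose order equals the season length. It handles odd season lengths only, and positions where the window does not fit are left undefined. The series in the usage note is hypothetical and is not the exam data.

```python
def centered_moving_average(y, s):
    """Estimate the trend as a centered moving average of order s (s must be odd).

    Positions without a complete window are returned as None.
    """
    if s % 2 == 0:
        raise ValueError("this sketch only handles odd season lengths")
    k = s // 2
    trend = [None] * len(y)
    for t in range(k, len(y) - k):
        # Average the k values before t, the value at t, and the k values after t.
        trend[t] = sum(y[t - k:t + k + 1]) / s
    return trend
```

For a hypothetical series of nine values and s = 7, only the three middle positions receive a trend value, since each window needs three observations on either side.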
Question 5: Word Embedding Training (3 Points)
The following context is given: 


• context window size: 6 (3 words before + 3 words after target)
• word embedding dimensionality: 10
• hidden layer size: 10
• total number of words in vocabulary: 500


Apply the naive training algorithm to train a word-embedding model. After the embedding
lookup and reshaping, we therefore perform a classification, for which we use the neural
network depicted below (the first layer (left) is the input, the third layer (right) is the
output). Please note that activation functions are not shown in the figure.


[Figure: three-layer network; layers labeled 1st, 2nd, 3rd from left to right]


After reshaping, how many neurons are necessary in each layer? (3 points) 
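The layer sizes can be derived with a small calculation. This sketch rests on two assumptions: reshaping concatenates the embeddings of all context words into one input vector, and the naive algorithm classifies over the full vocabulary with one output neuron per word. The function name `layer_sizes` is hypothetical.

```python
def layer_sizes(window, dim, hidden, vocab):
    """Return the neuron counts (input, hidden, output) under the stated assumptions."""
    input_size = window * dim  # concatenated context embeddings
    output_size = vocab        # one class per vocabulary word
    return input_size, hidden, output_size
```

Under these assumptions, a window of 6 words with 10-dimensional embeddings gives 6 * 10 = 60 input neurons, followed by 10 hidden neurons and 500 output neurons.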


